Section 1: Data preparation (Import, transform, sort and filter)


Import libraries

library(tidyverse)
library(stringr)
library(ggpubr)
library(knitr)

Define paths to data sets

If you don’t keep your data in the same directory as the code, adapt the path names.

dir1 <- "~"
dir2 <- "Desktop"
dir3 <- "AC"
dir4 <- "Useful"
dir5 <- "Carrer"
dir6 <- "Skills "
dir7 <- "3. Skills para trabajo"
dir8 <- "10. R data science, statistics, machine learning"
dir9 <- "Portfolio analysis" 
dir10 <- "3. Marketing Analytics" 
file_name  <- "Data"
PSDS_PATH <- file.path(dir1, dir2, dir3, dir4, dir5, dir6, dir7, dir8, dir9, dir10, file_name)

Import the CSV file as a data frame

Data <- read_csv(file.path(PSDS_PATH, 'ml_project1_data.csv'))

Sort ascending according to ID

Data <- arrange(Data, ID)

Transform Year_Birth to Age

It would be useful to have a feature with the age of the clients.

As the last date in the data is in June 2014, we assume that this analysis is being performed in 2014 for the age calculations.

Data <- Data %>%
  mutate(Age = 2014 - Year_Birth)

Section 2: Data processing (cleaning)


Explore the data frame

str(Data)
## tibble [2,240 × 30] (S3: tbl_df/tbl/data.frame)
##  $ ID                 : num [1:2240] 0 1 9 13 17 20 22 24 25 35 ...
##  $ Year_Birth         : num [1:2240] 1985 1961 1975 1947 1971 ...
##  $ Education          : chr [1:2240] "Graduation" "Graduation" "Master" "PhD" ...
##  $ Marital_Status     : chr [1:2240] "Married" "Single" "Single" "Widow" ...
##  $ Income             : num [1:2240] 70951 57091 46098 25358 60491 ...
##  $ Kidhome            : num [1:2240] 0 0 1 0 0 0 1 1 0 1 ...
##  $ Teenhome           : num [1:2240] 0 0 1 1 1 1 0 1 1 0 ...
##  $ Dt_Customer        : Date[1:2240], format: "2013-05-04" "2014-06-15" ...
##  $ Recency            : num [1:2240] 66 0 86 57 81 91 99 96 9 35 ...
##  $ MntWines           : num [1:2240] 239 464 57 19 637 43 185 18 460 32 ...
##  $ MntFruits          : num [1:2240] 10 5 0 0 47 12 2 2 35 1 ...
##  $ MntMeatProducts    : num [1:2240] 554 64 27 5 237 23 88 19 422 64 ...
##  $ MntFishProducts    : num [1:2240] 254 7 0 0 12 29 15 0 33 16 ...
##  $ MntSweetProducts   : num [1:2240] 87 0 0 0 19 15 5 2 12 12 ...
##  $ MntGoldProds       : num [1:2240] 54 37 36 8 76 61 14 6 153 85 ...
##  $ NumDealsPurchases  : num [1:2240] 1 1 4 2 4 1 2 5 2 3 ...
##  $ NumWebPurchases    : num [1:2240] 3 7 3 1 6 2 6 3 6 2 ...
##  $ NumCatalogPurchases: num [1:2240] 4 3 2 0 11 1 1 0 6 2 ...
##  $ NumStorePurchases  : num [1:2240] 9 7 2 3 7 4 5 4 7 3 ...
##  $ NumWebVisitsMonth  : num [1:2240] 1 5 8 6 5 4 8 7 4 6 ...
##  $ AcceptedCmp3       : num [1:2240] 0 0 0 0 0 0 0 0 0 0 ...
##  $ AcceptedCmp4       : num [1:2240] 0 0 0 0 0 0 0 0 0 0 ...
##  $ AcceptedCmp5       : num [1:2240] 0 0 0 0 0 0 0 0 0 0 ...
##  $ AcceptedCmp1       : num [1:2240] 0 0 0 0 0 0 0 0 0 0 ...
##  $ AcceptedCmp2       : num [1:2240] 0 1 0 0 0 0 0 0 0 0 ...
##  $ Complain           : num [1:2240] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_CostContact      : num [1:2240] 3 3 3 3 3 3 3 3 3 3 ...
##  $ Z_Revenue          : num [1:2240] 11 11 11 11 11 11 11 11 11 11 ...
##  $ Response           : num [1:2240] 0 1 0 0 0 0 0 0 0 1 ...
##  $ Age                : num [1:2240] 29 53 39 67 43 49 38 54 56 27 ...

Categorical feature options

#Unique categories in each categorical column
unique(Data$Education)
## [1] "Graduation" "Master"     "PhD"        "2n Cycle"   "Basic"
unique(Data$Marital_Status)
## [1] "Married"  "Single"   "Widow"    "Divorced" "Together" "Alone"    "YOLO"    
## [8] "Absurd"

As can be observed, there are no trailing or leading spaces, misspellings, or blank values; therefore, this portion of the data is clean.

About the data

Exploring the features’ characteristics
Group Range Data_Type
ID 0-11191 Categorical nominal
Age 18-121 Numeric discrete
Income 1730-666666 Numeric continuous
Kidhome 0-2 Numeric discrete
Teenhome 0-2 Numeric discrete
Dt_Customer 2012-07-30 to 2014-06-29 Numeric discrete
Recency 0-99 Numeric discrete
MntWines 0-1493 Numeric continuous
MntFruits 0-199 Numeric continuous
MntMeatProducts 0-1725 Numeric continuous
MntFishProducts 0-259 Numeric continuous
MntSweetProducts 0-263 Numeric continuous
MntGoldProds 0-362 Numeric continuous
NumDealsPurchases 0-15 Numeric discrete
NumWebPurchases 0-27 Numeric discrete
NumCatalogPurchases 0-28 Numeric discrete
NumStorePurchases 0-13 Numeric discrete
NumWebVisitsMonth 0-20 Numeric discrete
AcceptedCmp1 0-1 Categorical nominal
AcceptedCmp2 0-1 Categorical nominal
AcceptedCmp3 0-1 Categorical nominal
AcceptedCmp4 0-1 Categorical nominal
AcceptedCmp5 0-1 Categorical nominal
Complain 0-1 Categorical nominal
Z_CostContact 3-3 Not sure
Z_Revenue 11-11 Not sure
Response 0-1 Categorical nominal
Education - Categorical nominal
Marital Status - Categorical nominal

It can be observed that the ranges of the numerical data are as expected; therefore, this portion of the data is clean.
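The ranges in the table above can also be computed programmatically; a minimal dplyr sketch, using a small toy tibble in place of the real Data frame (the column values here are illustrative only):

```r
library(dplyr)

# Toy stand-in for the real Data frame (illustrative values only)
toy <- tibble::tibble(
  Income  = c(1730, 52000, 666666),
  Recency = c(0, 45, 99),
  Kidhome = c(0, 1, 2)
)

# Min and max of every numeric column; the default across() naming
# produces columns Income_min, Income_max, Recency_min, ...
ranges <- toy %>%
  summarise(across(where(is.numeric), list(min = min, max = max)))

ranges
```

Applied to the full Data frame, the same `summarise(across(...))` call would produce the ranges tabulated above in one pass.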

Duplicates

duplicates <- duplicated(Data$ID)
num_true <- sum(duplicates)
print(num_true)
## [1] 0
remove(duplicates,num_true)

We can conclude that there are no duplicates.

Section 3: Exploratory Data Analysis


Before we start, it would be useful to know how many Responses we had in our test.

We know that there are 2240 observations and that Response is a binary datatype (0 or 1).

table(Data$Response)
## 
##    0    1 
## 1906  334

Categorical features exploration

I’ll start by exploring the categorical features (Education and Marital_Status).
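The plots for the sections below are not embedded in this text version; the comparison behind them is essentially the response rate per category. A minimal sketch, with a toy tibble standing in for Data (values illustrative only):

```r
library(dplyr)

# Toy stand-in for Data; column names mirror the real Education and Response columns
toy <- tibble::tibble(
  Education = c("PhD", "PhD", "Master", "Master", "Basic", "Basic"),
  Response  = c(1, 1, 1, 0, 0, 0)
)

# Share of positive responses within each category
rates <- toy %>%
  group_by(Education) %>%
  summarise(n = n(), response_rate = mean(Response))

rates
# A bar chart of response_rate per category (e.g. ggplot2's geom_col)
# would reproduce the kind of comparison described below.
```

The same `group_by()` + `summarise(mean(Response))` pattern applies to Marital_Status or any other categorical feature.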


Education

Observations for level of education

We noticed that there is a pattern worth studying further:

*People with PhDs tend to respond better to the ad.

In the next section (Statistical analysis), we will determine whether this difference is statistically significant, to decide if we can use this feature in our model to predict the outcome.


Marital Status

Observations for Marital Status

We noticed that there are patterns worth studying further:

*When customers are not in a relationship (“Single”, “Widow”, “Divorced”), their response to the ad increases.

*The opposite is also true: when customers are in a relationship (“Married” or “Together”), their response to the ad decreases.

In the next section (Statistical analysis), we will determine whether these differences are statistically significant, to decide if we can use this feature in our model to predict the outcome.


Complain

Observations for Complain

We noticed that there is a pattern worth studying further:

*People who complain are much less likely to respond to campaign 6.

In the next section (Statistical analysis), we will determine whether this difference is statistically significant, to decide if we can use this feature in our model to predict the outcome.


Response to previous campaigns vs Response to target campaign

Observations for response to previous campaigns

We noticed that there is a pattern worth studying further:

*People who responded positively to our previous campaigns are more likely to respond positively to campaign 6.

In the next section (Statistical analysis), we will determine whether this difference is statistically significant, to decide if we can use these features in our model to predict the outcome.

Numerical discrete features exploration

Now it’s time to study the numerical discrete features (Kidhome, Teenhome, AcceptedCmp[1-5], Complain); these can be explored similarly to the categorical features.


Kids Home

Observations for Kids Home

We noticed that there is a pattern worth studying further:

*The more kids people have, the less likely they are to respond to our ads.

In the next section (Statistical analysis), we will determine whether this difference is statistically significant, to decide if we can use this feature in our model to predict the outcome.


Teens Home

Observations for Teens Home

We noticed that there is a pattern worth studying further:

*The more teens people have, the less likely they are to respond to our ads.

In the next section (Statistical analysis), we will determine whether this difference is statistically significant, to decide if we can use this feature in our model to predict the outcome.


Age

We notice that there are a few outliers above 100. Probably these people found the cure to death, but just to be sure, we decided to exclude them.

Data <- Data %>%
  filter(Age < 100)

Observations for Age

As observed in the boxplot, there is minimal difference in age between the people who responded positively to our ad and those who didn’t.

Therefore, we will not pursue a statistical analysis, nor use this feature in the model to predict the response.
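The boxplots referenced in this and the following sections are not embedded in this text version; a minimal sketch of how such a feature-by-response boxplot could be drawn with ggplot2 (toy values in place of the real Data frame):

```r
library(ggplot2)

# Toy stand-in: ages for non-responders (0) and responders (1); values illustrative
toy <- data.frame(
  Response = factor(rep(c(0, 1), each = 5)),
  Age      = c(35, 42, 50, 61, 48, 33, 40, 52, 59, 47)
)

# Side-by-side boxplots of Age for each Response group
p <- ggplot(toy, aes(x = Response, y = Age)) +
  geom_boxplot() +
  labs(title = "Age by Response")

# print(p) renders the plot in an interactive session
```

Swapping `Age` for Recency, NumWebPurchases, etc. gives the corresponding plots for the other sections.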


Recency

Observations for Recency

We noticed that there is a pattern worth studying further:

*People with lower recency tend to respond better to the ad.

In the next section (Statistical analysis), we will determine whether this difference is statistically significant, to decide if we can use this feature in our model to predict the outcome.


NumDealsPurchases

We notice that there are a few outliers above 7.5, so we decided to exclude them.

Data <- Data %>%
  filter(NumDealsPurchases < 7.5)

Observations for NumDealsPurchases

As observed in the boxplot, there is minimal difference in NumDealsPurchases between the people who responded positively to our ad and those who didn’t.

Therefore, we will not pursue a statistical analysis, nor use this feature in the model to predict the response.


NumWebPurchases

We notice that there are a few outliers above 15, so we decided to exclude them.

Data <- Data %>%
  filter(NumWebPurchases < 15)

Observations for NumWebPurchases

We noticed that there is a pattern worth studying further:

*People with higher NumWebPurchases tend to respond better to the ad.

In the next section (Statistical analysis), we will determine whether this difference is statistically significant, to decide if we can use this feature in our model to predict the outcome.


NumCatalogPurchases

We notice that there are a few outliers above 15, so we decided to exclude them.

Data <- Data %>%
  filter(NumCatalogPurchases < 15)

Observations for NumCatalogPurchases

We noticed that there is a pattern worth studying further:

*People with higher NumCatalogPurchases tend to respond better to the ad.

In the next section (Statistical analysis), we will determine whether this difference is statistically significant, to decide if we can use this feature in our model to predict the outcome.


NumStorePurchases

Observations for NumStorePurchases

As observed in the boxplot, there is minimal difference in NumStorePurchases between the people who responded positively to our ad and those who didn’t.

Therefore, we will not pursue a statistical analysis, nor use this feature in the model to predict the response.


NumWebVisitsMonth

We notice that there are a few outliers above 12, so we decided to exclude them.

Data <- Data %>%
  filter(NumWebVisitsMonth < 12)

Observations for NumWebVisitsMonth

We noticed that there is a pattern worth studying further:

*The distribution for people who respond better to the ad is wider; therefore, the observations closer to the tails can help our model better predict the outcome.

In the next section (Statistical analysis), we will determine whether this difference is statistically significant, to decide if we can use this feature in our model to predict the outcome.


Dt_Customer

Observations for Dt_Customer

We noticed that there is a pattern worth studying further:

*The earlier a customer’s enrollment date with the company, the better they respond to the ad.

In the next section (Statistical analysis), we will determine whether this difference is statistically significant, to decide if we can use this feature in our model to predict the outcome.

Numerical features exploration

Now it’s time to study the numerical continuous features.


Income

We notice that there are a few outliers above 140000, so we decided to exclude them:

Data <- Data %>%
  filter(Income  < 140000)

Observations for Income

We noticed that there is a pattern worth studying further:

*The higher the client’s income, the better the client responds to the ad.

In the next section (Statistical analysis), we will determine whether this difference is statistically significant, to decide if we can use this feature in our model to predict the outcome.


MntWines


MntFruits


MntMeatProducts


MntFishProducts


MntSweetProducts


MntGoldProds

Observations for Mnt’s

We noticed that there is a pattern worth studying further:

*Overall, the more the clients spend in any product category, the more likely they are to respond to our ad.

In the next section (Statistical analysis), we will determine whether these differences are statistically significant, to decide if we can use these features in our model to predict the outcome.

Section 4: Statistical Analysis


Before we start the statistical tests, we created a table defining which variables we are going to perform the tests on.

Which features are we going to keep analyzing?
Group Pursue_a_statistical_analysis
ID NO
Age NO
Income YES
Kidhome YES
Teenhome YES
Dt_Customer YES
Recency YES
MntWines YES
MntFruits YES
MntMeatProducts YES
MntFishProducts YES
MntSweetProducts YES
MntGoldProds YES
NumDealsPurchases NO
NumWebPurchases YES
NumCatalogPurchases YES
NumStorePurchases NO
NumWebVisitsMonth YES
AcceptedCmp1 YES
AcceptedCmp2 YES
AcceptedCmp3 YES
AcceptedCmp4 YES
AcceptedCmp5 YES
Complain YES
Z_CostContact NO
Z_Revenue NO
Education YES
Marital Status YES

Filter by Response

Also, it would be useful to filter the Data by response for the tests.

Data_0 <- Data %>%
  filter(Response == 0)
nrow(Data_0)
## [1] 1830
Data_1 <- Data %>%
  filter(Response == 1)
nrow(Data_1)
## [1] 323

The Data_1 data frame has 323 observations, and the Data_0 data frame has 1830.

Permutation Tests for numerical continuous features

In this test we are going to compare the difference of the medians between the Response groups (0 vs 1) to determine whether our findings are statistically significant or due to random chance.

To keep the analysis short, we show the full calculations only for the first feature (Income); for the remaining features we report only the result of the test.
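The permutation-test code itself is not reproduced in this document; the following self-contained sketch illustrates the procedure on toy income samples (the values and the 2,000-permutation count are illustrative, not the author’s actual settings):

```r
set.seed(42)

# Toy income samples standing in for Data_0$Income and Data_1$Income
income_0 <- c(30, 35, 40, 45, 50, 55) * 1000
income_1 <- c(55, 60, 65, 70, 75, 80) * 1000

# Observed difference of medians between the two Response groups
obs_diff <- median(income_1) - median(income_0)

# Permutation step: shuffle group membership, recompute the median difference
perm_diff <- function(x, n1) {
  idx <- sample(length(x), n1)
  median(x[idx]) - median(x[-idx])
}
pooled <- c(income_0, income_1)
perm_diffs <- replicate(2000, perm_diff(pooled, length(income_1)))

# One-sided p-value: share of permuted differences at least as large as observed
p_value <- mean(perm_diffs >= obs_diff)
p_value
```

A reported p-value of 0 in this setup simply means that none of the sampled permutations produced a difference as extreme as the observed one.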


Income

median_Income_0 <- median(Data_0$Income)
median_Income_0
## [1] 49724
median_Income_1 <- median(Data_1$Income)
median_Income_1
## [1] 64509
Density plot of the bootstrapped median income (95% confidence) for both responses
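The bootstrap code behind that density plot is likewise not shown; a minimal sketch of bootstrapping a median with a 95% percentile interval, on a toy income sample:

```r
set.seed(7)

# Toy income sample standing in for one Response group's Income column
income <- c(48, 52, 55, 60, 63, 64, 66, 70, 75, 80) * 1000

# Resample with replacement and recompute the median many times
boot_medians <- replicate(5000, median(sample(income, replace = TRUE)))

# 95% percentile confidence interval for the median
ci <- quantile(boot_medians, c(0.025, 0.975))
ci
# plot(density(boot_medians)) draws a density like the one referenced above
```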

Results of the permutation test for Income

## [1] 0

The result is a p-value of 0 (none of the permuted differences was as extreme as the observed one), which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that Income could be a potential feature for the model to make predictions.


MntWines

Results of the permutation test for MntWines

## [1] 0

The result is a p-value of 0, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that MntWines could be a potential feature for the model to make predictions.


MntFruits

Results of the permutation test for MntFruits

## [1] 0

The result is a p-value of 0, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that MntFruits could be a potential feature for the model to make predictions.


MntMeatProducts

Results of the permutation test for MntMeatProducts

## [1] 0

The result is a p-value of 0, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that MntMeatProducts could be a potential feature for the model to make predictions.


MntFishProducts

Results of the permutation test for MntFishProducts

## [1] 0

The result is a p-value of 0, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that MntFishProducts could be a potential feature for the model to make predictions.


MntSweetProducts

Results of the permutation test for MntSweetProducts

## [1] 0

The result is a p-value of 0, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that MntSweetProducts could be a potential feature for the model to make predictions.


MntGoldProds

Results of the permutation test for MntGoldProds

## [1] 0

The result is a p-value of 0, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that MntGoldProds could be a potential feature for the model to make predictions.

Permutation Tests for categorical nominal features with 2 possible values

In this test we are going to compare the conversion rates between the Response groups (0 vs 1) to determine whether our findings are statistically significant or due to random chance.

To keep the analysis short, we show the full calculations only for the first feature (AcceptedCmp1); for the remaining features we report only the result of the test.
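As with the numeric features, the permutation code is omitted from the document; a self-contained sketch of a permutation test on a difference in conversion rates, using toy counts rather than the real campaign data:

```r
set.seed(123)

# Toy counts: 30 customers accepted a previous campaign (12 of them converted),
# 170 did not (17 converted); all values are illustrative only
converted <- c(rep(1, 12), rep(0, 18), rep(1, 17), rep(0, 153))
accepted  <- c(rep(1, 30), rep(0, 170))

# Observed difference in conversion rate (%conv1 - %conv2)
obs_diff <- mean(converted[accepted == 1]) - mean(converted[accepted == 0])

# Shuffle the 'accepted' labels and recompute the rate difference each time
perm_diffs <- replicate(2000, {
  shuffled <- sample(accepted)
  mean(converted[shuffled == 1]) - mean(converted[shuffled == 0])
})

# One-sided p-value: share of permuted differences at least as large as observed
p_value <- mean(perm_diffs >= obs_diff)
p_value
```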


AcceptedCmp1

We start by making a table with the counts.

## # A tibble: 4 × 3
## # Groups:   AcceptedCmp1, Response [4]
##   AcceptedCmp1 Response     n
##          <dbl>    <dbl> <int>
## 1            0        0  1768
## 2            0        1   244
## 3            1        1    79
## 4            1        0    62

Then we calculate the % of conversion for people who accepted Cmp1 vs people who didn’t, and we subtract them.

obs_pct_diff_Cmp1 <- 100 * (79/141 -244/2012) #%conv1 - %conv2
obs_pct_diff_Cmp1
## [1] 43.90113
Results of the permutation test for AcceptedCmp1

Now we are going to determine whether this difference is statistically significant.

## [1] 43.90113
## [1] 0

The result is a p-value of 0, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that AcceptedCmp1 could be a potential feature for the model to make predictions.


AcceptedCmp2

We start by making a table with the counts.

## # A tibble: 4 × 3
## # Groups:   AcceptedCmp2, Response [4]
##   AcceptedCmp2 Response     n
##          <dbl>    <dbl> <int>
## 1            0        0  1820
## 2            0        1   303
## 3            1        1    20
## 4            1        0    10

Then we calculate the % of conversion for people who accepted Cmp2 vs people who didn’t, and we subtract them.

obs_pct_diff_Cmp2 <- 100 * (20/30 -303/2123) #%conv1 - %conv2 of response by accepted cmp
obs_pct_diff_Cmp2
## [1] 52.39441
Results of the permutation test for AcceptedCmp2

## [1] 52.39441
## [1] 0

The result is a p-value of 0, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that AcceptedCmp2 could be a potential feature for the model to make predictions.


AcceptedCmp3

We start by making a table with the counts.

## # A tibble: 4 × 3
## # Groups:   AcceptedCmp3, Response [4]
##   AcceptedCmp3 Response     n
##          <dbl>    <dbl> <int>
## 1            0        0  1745
## 2            0        1   246
## 3            1        0    85
## 4            1        1    77

Then we calculate the % of conversion for people who accepted Cmp3 vs people who didn’t, and we subtract them.

obs_pct_diff_Cmp3 <- 100 * (77/162 -246/1991) #%conv1 - %conv2 of response by accepted cmp
obs_pct_diff_Cmp3
## [1] 35.17526
Results of the permutation test for AcceptedCmp3

## [1] 35.17526
## [1] 0

The result is a p-value of 0, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that AcceptedCmp3 could be a potential feature for the model to make predictions.


AcceptedCmp4

We start by making a table with the counts.

## # A tibble: 4 × 3
## # Groups:   AcceptedCmp4, Response [4]
##   AcceptedCmp4 Response     n
##          <dbl>    <dbl> <int>
## 1            0        0  1730
## 2            0        1   264
## 3            1        0   100
## 4            1        1    59

Then we calculate the % of conversion for people who accepted Cmp4 vs people who didn’t, and we subtract them.

obs_pct_diff_Cmp4 <- 100 * (59/159 -264/1994) #%conv1 - %conv2 of response by accepted cmp
obs_pct_diff_Cmp4
## [1] 23.8672
Results of the permutation test for AcceptedCmp4

## [1] 23.8672
## [1] 0

The result is a p-value of 0, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that AcceptedCmp4 could be a potential feature for the model to make predictions.


AcceptedCmp5

We start by making a table with the counts.

## # A tibble: 4 × 3
## # Groups:   AcceptedCmp5, Response [4]
##   AcceptedCmp5 Response     n
##          <dbl>    <dbl> <int>
## 1            0        0  1760
## 2            0        1   232
## 3            1        1    91
## 4            1        0    70

Then we calculate the % of conversion for people who accepted Cmp5 vs people who didn’t, and we subtract them.

obs_pct_diff_Cmp5 <- 100 * (91/161 -232/1992) #%conv1 - %conv2 of response by accepted cmp
obs_pct_diff_Cmp5
## [1] 44.87515
Results of the permutation test for AcceptedCmp5

## [1] 44.87515
## [1] 0

The result is a p-value of 0, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that AcceptedCmp5 could be a potential feature for the model to make predictions.


Complain

We start by making a table with the counts.

## # A tibble: 4 × 3
## # Groups:   Complain, Response [4]
##   Complain Response     n
##      <dbl>    <dbl> <int>
## 1        0        0  1813
## 2        0        1   320
## 3        1        0    17
## 4        1        1     3

Then we calculate the % of conversion for people who complained vs people who didn’t, and we subtract them.

obs_pct_diff_Complain <- 100 * (320/2133 - 3/20) #%conv1 - %conv2 of response by Complain
obs_pct_diff_Complain
## [1] 0.002344116
Results of the permutation test for Complain

## [1] 0.002344116
## [1] 0.3161538
Observations for Complain

The result is a p-value of 0.316; this means a difference at least this large could arise from random chance about 31.6% of the time. Therefore, we fail to reject the null hypothesis and conclude that this result is not statistically significant.

Consequently, we will not consider this feature for the model to make predictions.

Chi-square Tests for categorical nominal features with more than 2 possible values

In this test we are going to compare the counts between the Response groups (0 vs 1) to determine whether our findings are statistically significant or due to random chance.

To keep the analysis short, we show the full calculations only for the first feature (Kidhome); for the remaining features we report only the result of the test.


Kidhome

We start by making a table with the counts.

## # A tibble: 6 × 3
## # Groups:   Kidhome, Response [6]
##   Kidhome Response     n
##     <dbl>    <dbl> <int>
## 1       0        0  1043
## 2       0        1   220
## 3       1        0   743
## 4       1        1   101
## 5       2        0    44
## 6       2        1     2
reaction_Kidhome <- matrix(Count_Kidhome$n, nrow=3, ncol=2, byrow=TRUE)
reaction_Kidhome
##      [,1] [,2]
## [1,] 1043  220
## [2,]  743  101
## [3,]   44    2
dimnames(reaction_Kidhome) <- list(unique(Data$Kidhome), unique(Data$Response))
reaction_Kidhome
##      0   1
## 0 1043 220
## 1  743 101
## 2   44   2
chisq.test(reaction_Kidhome, simulate.p.value=TRUE)
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  reaction_Kidhome
## X-squared = 15.978, df = NA, p-value = 0.001499

The result is a p-value of 0.001499, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that Kidhome could be a potential feature for the model to make predictions.


Teenhome

chisq.test(reaction_Teenhome, simulate.p.value=TRUE)
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  reaction_Teenhome
## X-squared = 64.254, df = NA, p-value = 0.0004998

The result is a p-value of 0.0004998, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that Teenhome could be a potential feature for the model to make predictions.


Dt_Customer

For this particular feature, in order to perform the Chi-square test, I had to bin Dt_Customer into 6-month periods.

# Create a new column with the 6-month period
Data$Period <- cut(Data$Dt_Customer, breaks = "6 months")

# Count the number of dates in each period
table(Data$Period)
## 
## 2012-07-01 2013-01-01 2013-07-01 2014-01-01 
##        470        561        578        544
Count_Period <- Data %>%
  group_by(Period, Response) %>%
  count(Period, sort = TRUE) %>%
  arrange(Period)

Count_Period
## # A tibble: 8 × 3
## # Groups:   Period, Response [8]
##   Period     Response     n
##   <fct>         <dbl> <int>
## 1 2012-07-01        0   342
## 2 2012-07-01        1   128
## 3 2013-01-01        0   467
## 4 2013-01-01        1    94
## 5 2013-07-01        0   525
## 6 2013-07-01        1    53
## 7 2014-01-01        0   496
## 8 2014-01-01        1    48
reaction_Period <- matrix(Count_Period$n, nrow=4, ncol=2, byrow=TRUE)
reaction_Period
##      [,1] [,2]
## [1,]  342  128
## [2,]  467   94
## [3,]  525   53
## [4,]  496   48
dimnames(reaction_Period) <- list(levels(Data$Period), unique(Data$Response))
reaction_Period
##              0   1
## 2012-07-01 342 128
## 2013-01-01 467  94
## 2013-07-01 525  53
## 2014-01-01 496  48
chisq.test(reaction_Period, simulate.p.value=TRUE)
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  reaction_Period
## X-squared = 88.206, df = NA, p-value = 0.0004998

The result is a p-value of 0.0004998, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that Dt_Customer could be a potential feature for the model to make predictions.


Recency

chisq.test(reaction_Recency, simulate.p.value=TRUE)
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  reaction_Recency
## X-squared = 98.154, df = NA, p-value = 0.0004998

The result is a p-value of 0.0004998, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that Recency could be a potential feature for the model to make predictions.


NumWebPurchases

chisq.test(reaction_NumWebPurchases, simulate.p.value=TRUE)
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  reaction_NumWebPurchases
## X-squared = 20.326, df = NA, p-value = 0.0004998

The result is a p-value of 0.0004998, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that NumWebPurchases could be a potential feature for the model to make predictions.


NumCatalogPurchases

chisq.test(reaction_NumCatalogPurchases, simulate.p.value=TRUE)
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  reaction_NumCatalogPurchases
## X-squared = 98.22, df = NA, p-value = 0.0004998

The result is a p-value of 0.0004998, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that NumCatalogPurchases could be a potential feature for the model to make predictions.


NumWebVisitsMonth

chisq.test(reaction_NumWebVisitsMonth, simulate.p.value=TRUE)
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  reaction_NumWebVisitsMonth
## X-squared = 41.662, df = NA, p-value = 0.0004998

The result is a p-value of 0.0004998, which indicates strong evidence against the null hypothesis; therefore, we conclude that this result is statistically significant.

This indicates that NumWebVisitsMonth could be a potential feature for the model to make predictions.

Anova Tests